FSNet: An Identity-Aware Generative Model for Image-based Face Swapping
This paper presents FSNet, a deep generative model for image-based face
swapping. Traditionally, face-swapping methods are based on three-dimensional
morphable models (3DMMs), and facial textures are replaced between the
estimated three-dimensional (3D) geometries in two images of different
individuals. However, the estimation of 3D geometries along with different
lighting conditions using 3DMMs remains a difficult task. We instead represent
the face region with a latent variable that is inferred by the proposed deep
neural network (DNN), rather than with facial textures. The proposed DNN
synthesizes a face-swapped image from the latent variable of the face region
and another image of the non-face region. The proposed method requires no 3DMM
fitting; it performs face swapping simply by feeding two face images to the
network. Consequently, our DNN-based face swapping handles challenging inputs
with differing face orientations and lighting conditions better than previous
approaches. Through several experiments, we demonstrate that the proposed
method performs face swapping more stably than the state-of-the-art method and
that its results are comparable to those of that method.
Comment: 20 pages, Asian Conference on Computer Vision 201
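The composition described above can be illustrated with a toy sketch (not the paper's architecture): an encoder maps the face region of image A to a latent variable, and a generator combines that latent with the non-face region of image B. The random linear maps below are illustrative stand-ins for the trained networks, shown only to make the data flow concrete.

```python
import numpy as np

# Toy sketch of the FSNet-style data flow: swap = G(E(face_A), nonface_B).
# W_enc and W_gen are random stand-ins for trained deep networks; only the
# shapes and the composition mirror the description in the abstract.

rng = np.random.default_rng(0)
D, Z = 64 * 64, 128                          # flattened image size, latent size

W_enc = rng.normal(size=(Z, D)) * 0.01       # stand-in for the face encoder
W_gen = rng.normal(size=(D, Z + D)) * 0.01   # stand-in for the generator

def swap_faces(face_a, nonface_b):
    z = W_enc @ face_a                       # latent variable of face region A
    return W_gen @ np.concatenate([z, nonface_b])  # condition on B's non-face region

face_a = rng.normal(size=D)                  # face region of image A
nonface_b = rng.normal(size=D)               # non-face region of image B
out = swap_faces(face_a, nonface_b)
print(out.shape)                             # image-sized output
```

The point of the sketch is that no 3D fitting appears anywhere: the swap is a single forward pass through the two mappings.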
Speaker-independent emotion recognition exploiting a psychologically-inspired binary cascade classification schema
In this paper, a psychologically-inspired binary cascade classification schema is proposed for speech emotion recognition. Performance is enhanced because commonly confused pairs of emotions become distinguishable from one another. Extracted features are related to statistics of pitch, formants, and energy contours, as well as spectrum, cepstrum, perceptual and temporal features, autocorrelation, MPEG-7 descriptors, Fujisaki's model parameters, voice quality, jitter, and shimmer. Selected features are fed as input to a k-nearest neighbor classifier and to support vector machines. Two kernels are tested for the latter: linear and Gaussian radial basis function. The recently proposed speaker-independent experimental protocol is tested on the Berlin emotional speech database for each gender separately. The best emotion recognition accuracy, achieved by support vector machines with a linear kernel, equals 87.7%, outperforming state-of-the-art approaches. Statistical analysis is carried out first with respect to the classifiers' error rates and then to evaluate the information expressed by the classifiers' confusion matrices. © Springer Science+Business Media, LLC 2011
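The cascade idea above can be sketched minimally: instead of one flat multi-class decision, the emotions are split recursively into two groups, so each commonly confused pair is separated by a dedicated binary decision. The tree and the stub decision rule below are illustrative stand-ins for the trained KNN/SVM classifiers, not the paper's actual splits.

```python
# Minimal sketch of a binary cascade classification schema. Internal nodes
# are (left_subtree, right_subtree) pairs; leaves are emotion labels.
# `decide` stands in for a trained binary classifier at each split.

def classify_cascade(node, features, decide):
    """Walk the binary tree, asking a binary classifier at each split."""
    while isinstance(node, tuple):            # internal node
        node = node[decide(node, features)]   # take branch 0 or 1
    return node                               # leaf: a single emotion label

# Illustrative tree: a high- vs low-arousal split first, then finer splits.
TREE = (("anger", "happiness"), ("sadness", "neutral"))

# Stub decision rule: pretend features[0] encodes arousal, features[1] valence.
def stub_decide(split, features):
    if split == TREE:
        return 0 if features[0] > 0.5 else 1  # arousal split at the root
    return 0 if features[1] < 0.5 else 1      # valence split at the leaves

print(classify_cascade(TREE, [0.9, 0.2], stub_decide))  # high arousal, low valence
```

Each internal node only has to solve a two-way problem, which is why confusable emotion pairs can be given their own specialised classifier.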
Enhanced multiclass SVM with thresholding fusion for speech-based emotion classification
As an essential approach to understanding human interactions, emotion classification is a vital component of behavioral studies and is also important in the design of context-aware systems. Recent studies have shown that speech contains rich information about emotion, and numerous speech-based emotion classification methods have been proposed. However, classification performance still falls short of what is needed for these algorithms to be used in real systems. We present an emotion classification system using several one-against-all support vector machines with a thresholding fusion mechanism that combines the individual outputs; this provides the ability to increase classification accuracy at the expense of rejecting some samples as unclassified. Results show that the proposed system outperforms three state-of-the-art methods and that the thresholding fusion mechanism effectively improves emotion classification, which matters for applications that require very high accuracy but do not require that every sample be classified. We evaluate the system under several challenging scenarios, including speaker-independent tests, tests on noisy speech signals, and tests using non-professional acted recordings, to demonstrate the performance of the system and the effectiveness of the thresholding fusion mechanism in real scenarios.
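The thresholding fusion step can be sketched as follows: each one-against-all classifier emits a real-valued score for its emotion class, and the fused decision accepts the top-scoring class only when that score clears a threshold, otherwise the sample is rejected as unclassified. The class names and threshold value are illustrative, not taken from the paper.

```python
import numpy as np

# Sketch of thresholding fusion over one-against-all decision scores.
# A hypothetical set of emotion classes, one score per class.
EMOTIONS = ["anger", "happiness", "sadness", "neutral"]

def fuse_with_threshold(scores, threshold=0.5):
    """Return the winning emotion, or None (unclassified) when no
    one-against-all score is confident enough."""
    scores = np.asarray(scores, dtype=float)
    best = int(np.argmax(scores))
    if scores[best] < threshold:
        return None                      # reject: leave the sample unclassified
    return EMOTIONS[best]

print(fuse_with_threshold([0.9, 0.2, -0.3, 0.1]))   # confident -> "anger"
print(fuse_with_threshold([0.1, 0.2, 0.15, 0.05]))  # all low -> None
```

Raising the threshold trades coverage for accuracy: more samples are rejected, but the ones that are classified carry higher-confidence labels, which is exactly the trade-off the abstract describes.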
Techniques for Mimicry and Identity Blending Using Morph Space PCA
We describe a face modelling tool that allows image representation in a high-dimensional morph space, compression to a small number of coefficients using PCA [1], and expression transfer between face models by projecting the source morph description (a parameterisation of complex facial motion) into the target morph space. This technique allows the creation of an identity-blended avatar model whose high degree of realism enables diverse applications in visual psychophysics, stimulus generation for perceptual experiments, animation, and affective computing. © 2013 Springer-Verlag
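The compress-then-project pipeline above can be sketched with an SVD-based PCA. The dimensions and data below are synthetic placeholders; only the mechanics (fitting principal axes of a morph description, compressing frames to a few coefficients, and re-expanding in a morph space) follow the abstract.

```python
import numpy as np

# Sketch of morph-space PCA: rows of `frames` are per-frame morph
# parameterisations; PCA compresses each to k coefficients.
rng = np.random.default_rng(0)
frames = rng.normal(size=(50, 12))       # 50 frames of a 12-dim morph description

def fit_pca(X, k):
    """Return (mean, top-k principal axes) of the rows of X."""
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:k]                  # principal axes as rows

mean, axes = fit_pca(frames, k=3)
coeffs = (frames - mean) @ axes.T        # compression: 12 dims -> 3 coefficients
recon = coeffs @ axes + mean             # back-projection into the morph space

# Expression transfer would project the source coefficients onto a *target*
# identity's (mean, axes) instead of the source's own.
print(float(np.linalg.norm(frames - recon) / np.linalg.norm(frames)))
```

With k equal to the full dimensionality the reconstruction is exact; with a small k it keeps only the dominant modes of facial motion, which is what makes the compressed coefficients a compact handle for transfer between identities.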